This report enables NESO to better understand and forecast long-term electricity demand. Using ordinary least squares, we identified key temporal and weather-based predictors that influence average daily demand. Valuing simplicity, we chose our model for both practical use and predictive ability.
Prior to fitting any models, we examine the dataset used in our analysis. We consider the following predictor variables for our linear model:
We visually inspected scatter plots of each predictor against demand to identify correlations and potential anomalies.
Wind generation showed no clear correlation with demand and was therefore excluded from further modelling as it provides no predictive value. In contrast, solar generation exhibited an inverse relationship with demand. This aligns with expectations, as increased sunlight reduces the need for lighting, lowering electricity demand.
Temperature variables all showed a negative correlation with demand. This is intuitive, as warmer conditions generally reduce heating needs. TE shows the strongest negative effect and is considered the best temperature measure because it incorporates a lagged effect that may reflect customers updating heating settings with a delay.
The DSN variable follows a quadratic distribution, but has a pronounced dip in December. This suggests that an interaction between month and DSN should be considered in the model.
Month has a clear effect on demand, with the highest levels observed in January and the lowest in November and March. Day of the week is an important predictor to include in the model, as demand is lower on weekends and highest on weekdays. However, combining weekdays and weekends reduces the resolution of important … differences. Year also has an impact on demand and will be included as a factor variable. However, this will be stripped out of the model by NESO, who will add their own year effect for future years.
The figures and observations from our data exploration are recorded below.
We aim to fit an ordinary least-squares linear regression model of the form \[\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\epsilon}\] assuming \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)\).
Variable significance is assessed using hypothesis tests on the model parameters, where a high p-value indicates no statistically significant effect in the model. The \(R^2\) metric measures the proportion of variance explained by the linear model. Models with a higher adjusted \(R^2\) are preferable to models with a lower one, as this indicates that the model explains a higher proportion of the variability in the data. Predictive accuracy is assessed through the root mean squared error (RMSE), which measures the average magnitude of the residuals, with lower values indicating that the model’s predictions are closer to the observed data. The AIC is used to compare the quality of models, balancing fit with complexity by penalising the number of parameters included in the model. The model with the lowest AIC is generally preferred, as it explains the data well without including unnecessary parameters.
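Although the report contains no code, the metrics above can all be computed directly from a least-squares fit. Below is a minimal numpy sketch on synthetic data; `ols_metrics` and the toy design matrix are illustrative stand-ins, not the model actually fitted to the demand data. The AIC follows the Gaussian log-likelihood convention (as used by R's `AIC`), counting the coefficients plus the error variance.

```python
import numpy as np

def ols_metrics(X, y):
    """Fit OLS by least squares and report R^2, adjusted R^2, RMSE and
    the Gaussian AIC. X is assumed to already contain an intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    rmse = np.sqrt(rss / n)
    # Gaussian log-likelihood AIC, counting the p coefficients plus sigma^2
    aic = n * np.log(rss / n) + n * (1.0 + np.log(2.0 * np.pi)) + 2.0 * (p + 1)
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "aic": aic}

# Toy illustration on synthetic data (not the NESO demand series)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([3.0, -0.5, 1.2]) + rng.normal(scale=0.3, size=200)
m = ols_metrics(X, y)
```

Note that adjusted \(R^2\) is always at most \(R^2\), since it discounts for the number of fitted parameters.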
Based on the exploratory data analysis, the final model was selected to capture the most important predictors and to account for non-linearities and interactions while balancing simplicity:
\[\begin{eqnarray*} M_F: y_i = \beta_0 + \beta_1 \text{solar}_i + \beta_2 \text{wdaynumber}_i + \beta_3 \text{month}_i + \beta_4 \text{year}_i + \beta_5 \text{DSN}^2_i + \beta_6 \text{TE}_i + \beta_7 \, \text{DSN}^2_i{:}\text{month}_i + \epsilon_i \end{eqnarray*}\] where \(\epsilon_i \overset{\mathrm{iid}}\sim \mathcal{N}(0, \sigma^2)\).
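As a rough sketch of how such a design matrix could be assembled — dummy-encoding the factor variables and multiplying the month dummies by \(\text{DSN}^2\) to form the interaction — the following uses hypothetical toy data; all names and values are stand-ins, not the NESO dataset:

```python
import numpy as np

def one_hot(codes, drop_first=True):
    """Dummy-encode an integer factor, dropping the first level as baseline."""
    levels = np.unique(codes)
    cols = levels[1:] if drop_first else levels
    return np.column_stack([(codes == lev).astype(float) for lev in cols])

# Hypothetical toy data standing in for the demand dataset
rng = np.random.default_rng(1)
n = 100
TE = rng.normal(5.0, 3.0, n)                 # effective temperature
solar = rng.uniform(0.0, 0.3, n)             # solar generation
dsn = rng.integers(0, 150, n).astype(float)  # days since 1 November
month = rng.choice([11, 12, 1, 2, 3], n)
wday = rng.integers(0, 7, n)

month_d = one_hot(month)                     # month dummies, baseline dropped
X = np.column_stack([
    np.ones(n),                              # intercept
    solar,
    TE,
    one_hot(wday),                           # weekday dummies
    (dsn ** 2)[:, None],                     # quadratic DSN term
    month_d,
    month_d * (dsn ** 2)[:, None],           # DSN^2 : month interaction
])
```

In R this corresponds to a formula such as `I(DSN^2):factor(Month)`, visible in the coefficient table below.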
The TE variable was selected as the temperature measure
because it incorporates information from both the current and previous
day, capturing lagged effects in demand. When compared to alternatives,
the model using TE produced a higher \(R^2\), lower RMSE and lower AIC, indicating
a better overall model fit and predictive performance.
| Model | \(R^2\) | Adjusted \(R^2\) | RMSE | AIC |
|---|---|---|---|---|
| Model with temp | 0.8585733 | 0.8569565 | 1533.105 | 62062.24 |
| Model with TO | 0.8478952 | 0.8461564 | 1589.928 | 62319.91 |
| Model with TE | 0.8696681 | 0.8681782 | 1471.741 | 61773.03 |
Solar generation was included as higher solar output was associated with lower average demand and remained statistically significant in the final model (\(p<0.01\)). Day-of-the-week effects were modelled as a factor to capture the reduction in weekend demand and the small variations across weekdays, with all factor levels highly significant (\(p<0.01\)). The DSN (days since November) variable exhibited a quadratic relationship with demand, so \(\text{DSN}^2\) was included to capture this non-linearity. Month and Year were included as factor variables to account for seasonal and annual variation. A distinct dip in demand in December motivated the inclusion of an interaction between \(\text{DSN}^2\) and Month, allowing the model to capture monthly variations in the quadratic effect and substantially improving the AIC (a decrease of 1,846.95). The remaining variables were included due to trends seen in the exploratory analysis and because they are statistically significant, with most p-values below 0.01.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 33526.962 | 321.157 | 104.394 | 0.000 |
| TE | -529.501 | 10.654 | -49.700 | 0.000 |
| solar_sarah | -15846.613 | 1141.001 | -13.888 | 0.000 |
| factor(WeekdayNum)1 | 5596.396 | 93.053 | 60.142 | 0.000 |
| factor(WeekdayNum)2 | 6324.780 | 93.121 | 67.920 | 0.000 |
| factor(WeekdayNum)3 | 6360.136 | 93.097 | 68.317 | 0.000 |
| factor(WeekdayNum)4 | 6279.360 | 93.250 | 67.339 | 0.000 |
| factor(WeekdayNum)5 | 5464.630 | 93.218 | 58.622 | 0.000 |
| factor(WeekdayNum)6 | 1169.326 | 93.079 | 12.563 | 0.000 |
| I(DSN^2) | 0.660 | 0.041 | 16.114 | 0.000 |
| factor(Month)2 | 4825.442 | 455.153 | 10.602 | 0.000 |
| factor(Month)3 | 5306.641 | 496.149 | 10.696 | 0.000 |
| factor(Month)11 | 3286.746 | 266.867 | 12.316 | 0.000 |
| factor(Month)12 | 9149.686 | 289.650 | 31.589 | 0.000 |
| factor(Year)1992 | -34.595 | 226.488 | -0.153 | 0.879 |
| factor(Year)1993 | -26.648 | 226.925 | -0.117 | 0.907 |
| factor(Year)1994 | 453.790 | 227.098 | 1.998 | 0.046 |
| factor(Year)1995 | 1502.790 | 226.743 | 6.628 | 0.000 |
| factor(Year)1996 | 2365.578 | 227.285 | 10.408 | 0.000 |
| factor(Year)1997 | 2804.758 | 227.065 | 12.352 | 0.000 |
| factor(Year)1998 | 3134.583 | 227.123 | 13.801 | 0.000 |
| factor(Year)1999 | 3779.208 | 226.878 | 16.657 | 0.000 |
| factor(Year)2000 | 4435.593 | 226.786 | 19.558 | 0.000 |
| factor(Year)2001 | 5206.297 | 226.702 | 22.965 | 0.000 |
| factor(Year)2002 | 5574.093 | 227.401 | 24.512 | 0.000 |
| factor(Year)2003 | 6347.567 | 226.968 | 27.967 | 0.000 |
| factor(Year)2004 | 6576.452 | 226.678 | 29.012 | 0.000 |
| factor(Year)2005 | 5704.218 | 226.901 | 25.140 | 0.000 |
| factor(Year)2006 | 5376.686 | 226.954 | 23.691 | 0.000 |
| factor(Year)2007 | 4761.861 | 227.234 | 20.956 | 0.000 |
| factor(Year)2008 | 4100.558 | 226.620 | 18.094 | 0.000 |
| factor(Year)2009 | 2600.179 | 226.727 | 11.468 | 0.000 |
| factor(Year)2010 | 2864.807 | 228.077 | 12.561 | 0.000 |
| factor(Year)2011 | 2258.601 | 227.157 | 9.943 | 0.000 |
| factor(Year)2012 | 1629.380 | 226.662 | 7.189 | 0.000 |
| factor(Year)2013 | 1119.108 | 227.056 | 4.929 | 0.000 |
| factor(Year)2014 | 488.905 | 227.196 | 2.152 | 0.031 |
| I(DSN^2):factor(Month)2 | -0.708 | 0.053 | -13.357 | 0.000 |
| I(DSN^2):factor(Month)3 | -0.728 | 0.047 | -15.418 | 0.000 |
| I(DSN^2):factor(Month)11 | 0.502 | 0.225 | 2.231 | 0.026 |
| I(DSN^2):factor(Month)12 | -3.808 | 0.079 | -48.028 | 0.000 |
To assess model performance, we first compare the \(R^2\), adjusted \(R^2\), AIC and RMSE of the final model to alternatives. In particular, we compare against
Model with no interactions: \[y_i = \beta_0 + \beta_1 \text{solar}_i + \beta_2 \text{wdaynumber}_i + \beta_3 \text{month}_i + \beta_4 \text{year}_i + \beta_5 \text{DSN}^2_i + \beta_6 \text{TE}_i + \epsilon_i\] and Model with all interactions: \[y_i = \beta_0 + \beta_1 \text{solar}_i + \beta_2 \text{wdaynumber}_i + \beta_3 \text{month}_i + \beta_4 \text{year}_i + \beta_5 \text{DSN}^2_i + \beta_6 \text{TE}_i + \beta_7 \, \text{TE}_i{:}\text{solar}_i + \beta_8 \, \text{TE}_i{:}\text{month}_i + \beta_9 \, \text{solar}_i{:}\text{month}_i + \beta_{10} \, \text{wdaynumber}_i{:}\text{month}_i + \beta_{11} \, \text{DSN}^2_i{:}\text{month}_i + \epsilon_i\]
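One standard way to formalise the comparison between such nested models is a partial F-test on the added interaction terms. The following is a minimal numpy sketch on synthetic data; the function and the toy regressors are illustrative, not part of our analysis:

```python
import numpy as np

def nested_f_test(X_small, X_big, y):
    """Partial F-test comparing a reduced OLS model to a larger model
    that nests it; returns the F statistic and its degrees of freedom."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r)
    n = len(y)
    p_small, p_big = X_small.shape[1], X_big.shape[1]
    f = ((rss(X_small) - rss(X_big)) / (p_big - p_small)) / (rss(X_big) / (n - p_big))
    return f, (p_big - p_small, n - p_big)

# Synthetic example: the extra regressor genuinely matters, so F is large
rng = np.random.default_rng(2)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + x1 + 0.8 * x2 + rng.normal(scale=0.5, size=n)
X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
f_stat, dof = nested_f_test(X_small, X_big, y)
```

A large F statistic relative to the \(F_{(p_{\text{big}}-p_{\text{small}},\, n-p_{\text{big}})}\) distribution indicates the added terms improve the fit beyond chance.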
Whilst the model with all interactions achieved a higher \(R^2\), higher adjusted \(R^2\) and lower RMSE, our final model with only one interaction term is still a substantial improvement over the no-interaction model while using far fewer parameters. Consistent with our preference for simplicity, this balance of predictive ability and parsimony is why we chose it.
To continue checking our model’s performance we now examine the residuals to see if the underlying assumptions of the model are satisfied. This will help see whether the model provides a valid fit for our data.
The Residuals vs Fitted plot shows little change in variance with the mean, so the constant-variance assumption is upheld. The Q-Q Residuals plot plots the ordered standardised residuals against the quantiles of a standard normal distribution. It shows a slight deviation from normality, with an S-shaped pattern suggesting skewness in the residuals. However, given the large sample size, the central limit theorem means the sampling distribution of the estimates will still tend to normality despite this deviation. The Scale-Location plot shows very slight heteroscedasticity (the variance of the errors is not constant), as the red line has a very minor curve to it. This is not a major problem for our model, given the large sample size and our focus on prediction rather than inference. Finally, the Residuals vs Leverage plot gives Cook’s distance, which measures the change in all fitted values when the data point in question is omitted. None of our points has a large Cook’s distance or high leverage, so we conclude that there are no outliers requiring investigation.
To assess model generalizability, we used expanding-window cross-validation. Since our model includes Year as a factor, which would be updated by NESO annually based on their estimated year effect, we altered our cross-validation method as follows:
Firstly, we fit the model on an initial training set from 1991 up to 2000. The year effect is then removed from the trained model, and we use this to compute ‘yearless predictions’ for 2001. To mimic adding a year effect back in, we assume a known year effect, using the coefficient estimated from the full dataset; this is added to the ‘yearless predictions’ to obtain our final predictions. We then compare these with the observed data for 2001 and obtain the following metrics:
Squared error: \(\text{SE} = (y - \hat{y}_F)^2\)
Interval score: \(\text{IS}(\alpha) = U_F - L_F + \frac{2}{\alpha} (L_F - y) \mathcal{1}\{y < L_F\} + \frac{2}{\alpha} (y - U_F) \mathcal{1}\{y > U_F\}\) where \(L_F\) and \(U_F\) denote the lower and upper bounds of the prediction interval with coverage probability \(1 - \alpha\).
Dawid-Sebastiani: \(\text{DS} = \frac{(y - \hat{y}_F)^2}{\sigma^2_F} + \log(\sigma^2_F)\)
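These three scores can be implemented directly. A small sketch follows; the function names are our own, and the interval score assumes a central \((1-\alpha)\) prediction interval \([L_F, U_F]\):

```python
import numpy as np

def squared_error(y, y_hat):
    """SE = (y - y_hat)^2."""
    return (y - y_hat) ** 2

def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval
    [lower, upper]; penalises width plus any exceedance of either bound."""
    return ((upper - lower)
            + (2.0 / alpha) * (lower - y) * (y < lower)
            + (2.0 / alpha) * (y - upper) * (y > upper))

def dawid_sebastiani(y, y_hat, sigma2):
    """DS = (y - y_hat)^2 / sigma2 + log(sigma2)."""
    return (y - y_hat) ** 2 / sigma2 + np.log(sigma2)
```

All three are negatively oriented: lower scores indicate better predictions.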
We then repeat this process, expanding the training set by one year and moving the test set forward one year.
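The expanding-window scheme can be sketched as a sequence of train/test masks. The year labels below are illustrative, and the year-effect adjustment described above is omitted for brevity:

```python
import numpy as np

def expanding_window_splits(years, first_test_year, last_test_year):
    """Yield (train_mask, test_mask) pairs: train on every observation
    strictly before the test year, test on that single year, then expand
    the training window by one year and move the test year forward."""
    for test_year in range(first_test_year, last_test_year + 1):
        yield years < test_year, years == test_year

# Illustrative year labels: 1991-2014 with three observations per year,
# testing on 2001 through 2014 as in the scheme described above
years = np.repeat(np.arange(1991, 2015), 3)
splits = list(expanding_window_splits(years, 2001, 2014))
```

This yields one split per test year, so predictive scores are averaged over 14 test sets.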
The observed vs predicted values from our cross-validation are plotted in Figure 2. The points lie scattered around the \(y=x\) line, indicating accurate predictions. A few points lie above the line; however, these occur at lower demand levels. Since NESO prioritises accuracy at higher demand levels, where shortfalls are more likely to occur, the model remains well-suited for estimating average daily demand, particularly during periods of high demand.

Using cross-validation we find the following predictive scores for our final model and a basic model:
| Model | RMSE | Mean Dawid-Sebastiani Score | Mean Interval Score |
|---|---|---|---|
| \(M_F\) | 1462.544 | 15.58537 | 9085.693 |
| \(M_{B}\) | 1872.626 | 16.09661 | 12492.786 |
Each predictive score is lower for \(M_F\) than for \(M_B\), suggesting both more accurate predictions and greater confidence associated with those predictions.
To evaluate potential overfitting, we compare the RMSE on the full dataset with the RMSE from cross-validation:
| Fitted RMSE | Cross Validation RMSE |
|---|---|
| 1471.741 | 1462.544 |
To see that the final model is not overfitting, observe that the fitted RMSE is greater than the cross-validation RMSE. This is unusual, but can be explained by the averaging over the 14 test sets. The difference is also minor relative to the scale of the data, indicating that the model performs consistently on both training and new data.

Next, consider the limitations of our method: by assuming the known year effect from the full model, we leak information about the test year into the predictions. We still chose this method for two reasons. Firstly, it minimises disruption to the chronological structure of the data. Secondly, expanding the training set at each step incorporates more historical data, helping the model capture long-term, recurring monthly and weekly trends more effectively.